Dempster-Shafer Evidence Theory and Study of Some Key Problems


Ying-Jin Lu and Jun He 


Abstract: As one of the most important mathematical methods, the Dempster-Shafer (D-S) evidence theory has been widely used in data fusion, risk assessment, target identification, knowledge reasoning, and other fields. This paper summarizes the development of, and recent studies on, the interpretations of the D-S model, evidence combination algorithms, and the handling of conflict during evidence combination. It also compares the interpretation models, algorithms, and improvements together with their applicable conditions. The summary is intended as a reference for future research and applications.

Index Terms: Combination arithmetic, conflict, Dempster-Shafer (D-S) evidence theory, evidence combination.


1. Introduction 


The Dempster-Shafer (D-S) evidence theory is based on the work of Dempster during the 1960s and was successfully extended by Shafer. It is an uncertainty reasoning method that decomposes the entire problem into several subproblems with corresponding pieces of sub-evidence, and then uses the evidence combination rule to obtain the solution of the problem. Conventional probability theories based on mathematical statistics argue that probability is determined entirely by the frequency of the incident (the evidence) and is unrelated to people's preferences, so the probability is purely objective. Bayesian subjective probability theory argues that probability is a measure of people's preferences or subjective intention; because it focuses only on human judgment and ignores the objective evidence, the probability is purely subjective. D-S evidence theory emphasizes both the objectivity of the evidence and people's preferences during probability inference.

Generally, D-S evidence theory differs from traditional probability theories in that it explicitly distinguishes ignorance from uncertainty in the evidence combination process. Furthermore, D-S theory allows a probability to be assigned not only to singletons but also to sets of multiple alternative elements. These characteristics make D-S theory particularly suited to designing and implementing complex systems, and its flexibility in aggregating evidence has led to wide use in information fusion, target identification, fault diagnosis, and other fields.

The theory of evidence is often interpreted as an extension 
of the Bayesian theory of probabilities; however, it has also 
inspired several models of reasoning under uncertainty, which 
do not require the probabilistic view. In this section, we 
introduce some basic concepts of D-S evidence theory. 

Let $\Theta$ be a finite nonempty set called the frame of discernment, or simply the frame. $\Theta$ is composed of mutually exclusive objects and should include all the objects to be identified, that is, $\Theta=\{\theta_1, \theta_2, \ldots, \theta_n\}$, where each object $\theta_i$ is a conclusion the system may reach. There are three important functions in D-S theory: the basic probability assignment function (BPA), the belief function (Bel), and the likelihood function (Pl), which is also known as the plausibility function.

Basic probability assignment function: Assuming the frame of discernment $\Theta$ is known, the task is to determine the degree to which an uncertain element belongs to a subset of $\Theta$. A probability mass can be assigned to every subset of $\Theta$; this is called the basic probability assignment. A BPA is a function $m: 2^\Theta \to [0,1]$ satisfying

$$\sum_{A \subseteq \Theta} m(A) = 1, \qquad m(\emptyset) = 0. \tag{1}$$

The empty set $\emptyset$ represents a contradiction, which cannot be true in any state, so $m(\emptyset)$ is set to 0.

Belief function: The belief function denotes the total degree to which a proposition is supported by the obtained evidence. For propositions $A$ and $B$ satisfying $B \subseteq A$, $A \subseteq \Theta$, and $B \subseteq \Theta$, define

$$\mathrm{Bel}: 2^\Theta \to [0,1], \qquad \mathrm{Bel}(A) = \sum_{B \subseteq A} m(B) \tag{2}$$

where Bel is the belief function on $\Theta$.

Likelihood function: The likelihood function denotes the degree to which a proposition cannot be rejected by the obtained evidence. It is a map $\mathrm{Pl}: 2^\Theta \to [0,1]$ defined as

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}) = \sum_{A \cap B \neq \emptyset} m(B) \tag{3}$$

where $\bar{A}$ denotes the complement of $A$ in $\Theta$.
According to (2), the plausibility of $A$ equals the sum of the masses of all sets $B$ whose intersection with $A$ is not empty, as shown in (4). For all $A \subseteq \Theta$, $\mathrm{Bel}(A)$ forms a lower bound on the probability $P(A)$ that $A$ occurs, and $\mathrm{Pl}(A)$ forms an upper bound, as given by (5).

$$\mathrm{Pl}(A) = \sum_{B \,\mid\, B \cap A \neq \emptyset} m(B) \tag{4}$$

$$\mathrm{Bel}(A) \leq P(A) \leq \mathrm{Pl}(A). \tag{5}$$
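
The definitions of Bel and Pl can be illustrated with a minimal Python sketch. The dictionary representation of a BPA (subsets as `frozenset` keys), the function names `belief` and `plausibility`, and the example masses are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

def belief(m: Mass, a: FrozenSet[str]) -> float:
    """Bel(A): sum of m(B) over all focal elements B contained in A."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m: Mass, a: FrozenSet[str]) -> float:
    """Pl(A): sum of m(B) over all focal elements B that intersect A."""
    return sum(v for b, v in m.items() if b & a)

# Hypothetical BPA on the frame {a, b, c}; the masses sum to 1.
theta = frozenset("abc")
m = {
    frozenset("a"): 0.5,
    frozenset("ab"): 0.3,
    theta: 0.2,  # mass on the whole frame expresses ignorance
}

A = frozenset("a")
print(belief(m, A), plausibility(m, A))
```

In this hypothetical example the interval [Bel(A), Pl(A)] for A = {a} is wide because 0.5 of the mass sits on sets that contain a but do not commit to it.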


Given independent belief functions over the same frame of discernment, we can combine them into a common agreement concerning the subsets in $2^\Theta$ and quantify the conflict between them using Dempster's rule of combination. Given two mass functions $m_1$ and $m_2$, the rule computes a joint mass for the two pieces of evidence under the same frame of discernment. Dempster's rule is calculated as follows:


$$m_{12}(A) = \frac{1}{K} \sum_{Y_1 \cap Y_2 = A} m_1(Y_1)\, m_2(Y_2)$$

where

$$K = 1 - \sum_{Y_1 \cap Y_2 = \emptyset} m_1(Y_1)\, m_2(Y_2). \tag{6}$$


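Here $K$ is the normalization factor, and $1-K$, the total mass that the two belief functions assign to pairs of disjoint sets, measures the conflict between them. The rule applies to nonempty $A$, with $m_{12}(\emptyset)=0$, and is undefined when $K=0$. The combination rule is commutative and associative, so two or more belief functions can be combined in any order.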
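
As a rough illustration of rule (6), the following Python sketch combines two BPAs stored as dictionaries from `frozenset` subsets to masses. The function name `dempster_combine` and the example masses are hypothetical and chosen only for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two BPAs on the same frame with Dempster's rule (6)."""
    joint: Mass = {}
    conflict = 0.0
    for (y1, v1), (y2, v2) in product(m1.items(), m2.items()):
        inter = y1 & y2
        if inter:
            # Agreeing pair: the product of masses goes to the intersection.
            joint[inter] = joint.get(inter, 0.0) + v1 * v2
        else:
            # Disjoint pair: the product of masses counts as conflict.
            conflict += v1 * v2
    k = 1.0 - conflict  # normalization factor K
    if k <= 0.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {a: v / k for a, v in joint.items()}

# Hypothetical masses on the frame {a, b, c}.
m1 = {frozenset("a"): 0.6, frozenset("abc"): 0.4}
m2 = {frozenset("b"): 0.3, frozenset("ab"): 0.5, frozenset("abc"): 0.2}
m12 = dempster_combine(m1, m2)
for focal, mass in sorted(m12.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 4))
```

The double loop visits every pair of focal elements, which is why the number of focal elements dominates the cost of combination.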

D-S evidence theory distinguishes "uncertainty" from "do not know" accurately, which accords better with everyday reasoning, so it is practicable in engineering. It has been widely used in data fusion, risk assessment, target identification, knowledge reasoning, and other fields. However, several key problems of the evidence theory have not reached consensus, which restricts its further application and development. In recent years, scholars have done much work on the interpretation of D-S evidence theory, on improving the evidence synthesis rules, and on avoiding paradoxes during evidence synthesis. Although some scholars have reviewed these studies, their reviews mostly concern the interpretation of D-S evidence theory and the improvement of the synthesis rules, and do not cover the conflict that arises from both the synthesis rules and the sources of evidence. Moreover, in the process of evidence synthesis, the explosion in the number of focal elements often brings a huge amount of calculation, but previous reviews have not paid attention to this point. In this paper, we review research on the interpretation of D-S evidence theory, the algorithms of evidence combination, and the conflict during evidence combination, taking recent research into account.


2. Explanation of D-S Evidence Theory 


Ever since Shafer put forward the framework of D-S theory in [3], many scholars have tried to explain the basic concepts that Shafer left unexplained, but unfortunately no single explanation is accepted by all scholars. There are four main explanations at present: the upper and lower probability interpretation, the general Bayesian model, the random decoder model, and the transferable belief model.

Upper and lower probability interpretation model: Given a probability space $(\Theta, \mathcal{F}, p)$, $p$ is a probability measure on $(\Theta, \mathcal{F})$, and $\mathcal{F}$ is a collection of subsets of $\Theta$. If we define the lower and upper probabilities $p_*$ and $p^*$ to extend $p$ to $2^\Theta$, then $p^*(A) \geq p_*(A)$ and $p^*(A) = 1 - p_*(\bar{A})$, and $A$ is a measurable set if and only if $p^*(A) = p_*(A) = p(A)$. It is not hard to see that these concepts closely parallel the belief function and the likelihood function, which are both defined on the decision space. This interpretation can be used even when the prior knowledge does not satisfy the additivity of probability. Its shortcoming is that it cannot explain the combination rule of D-S theory, and the lower probability function does not satisfy the definition of the belief function.

General Bayesian model: When all focal elements meet the independence condition of Bayesian theory, the D-S combination formula reduces to the Bayesian formula. That is to say, the Bayesian formula is a special case of the D-S synthesis formula, and any data fusion performed with the Bayesian formula can instead be performed with the D-S formula. Because the D-S method imposes a weaker probability requirement, its fusion result is often superior to that of the Bayesian method.

Random decoder model: In order to explain the belief function, Shafer and Tversky proposed a random decoder model. In this model, every piece of evidence corresponds to a preset set $A$ and a probability $p$; if we judge that evidence $B$ is true, we need to preset $p(c_i \mid B) = p(c_i)$ for $c_i \in A$. The unreasonable assumption that the evidence does not change the probability distribution on $A$ was criticized by Levi and by Smets and Kennes. The random decoder divides all evidence into reliable evidence and shaky evidence in accordance with people's intuition, but in complex situations the decoder is not intuitive. The above three models are all based on probability theory.

Transferable belief model: In order to solve the problem of the preset in the random decoder model, Smets and Kennes studied the reliability updating of the D-S model and put forward the transferable belief model. This model presupposes that the evidence is insufficient. The transferable belief model distinguishes two different levels: the faith level and the decision-making level. The faith level is used for the acquisition, assignment, and updating of belief, and forms the static part of the model. The decision-making level transforms the belief into a decision probability and makes the decision, and forms the dynamic part of the model. To measure the belief, Smets introduced the principle of insufficient reason, under which the mass of a set is distributed equally among its elements when no further information is available. Denote the probability function obtained from the beliefs held at the faith level as betP; then


$$\mathrm{betP}(x, m) = \sum_{x \in A \subseteq \Theta} \frac{m(A)}{|A|} \tag{7}$$


The belief distribution can then be obtained from the resulting linear system of equations. The transferable belief model is independent of probability theory, but when the beliefs come from a game, the model inevitably faces the prisoner's dilemma.
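
The equal-sharing idea behind betP can be sketched in a few lines of Python. The function name `pignistic` and the example masses are hypothetical, and the sketch assumes that no mass is assigned to the empty set, as required by (1).

```python
from typing import Dict, FrozenSet

def pignistic(m: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Share the mass of each focal element equally among its elements.

    Assumes m assigns no mass to the empty set.
    """
    bet: Dict[str, float] = {}
    for focal, mass in m.items():
        share = mass / len(focal)
        for x in focal:
            bet[x] = bet.get(x, 0.0) + share
    return bet

# Hypothetical BPA on the frame {a, b, c}.
m = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}
bet = pignistic(m)
print({x: round(p, 4) for x, p in sorted(bet.items())})
```

The result is an ordinary probability distribution over single elements, which is what the decision-making level needs.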

In addition, Zadeh, Dubois and Prade, and Pawlak have also worked on explaining the evidence theory.

The above models explain D-S evidence theory from the perspectives of the sources of evidence theory, the conditions on focal elements, the reliability of evidence, and reliability updating. Because these models focus on different aspects, their applicable scopes differ. The upper and lower probability interpretation model is suitable for applications in which the prior knowledge does not satisfy the additivity of probability. Under the general Bayesian model, when all focal elements are independent, the fusion result of D-S evidence theory is superior to that of the Bayesian method. The random decoder model is better suited to environments in which the evidence is simple and its reliability can be clearly distinguished. The transferable belief model is not bound by the probability function, but beliefs that come from a game bring the prisoner's dilemma. In actual applications, we need to choose the most suitable model according to the practical problem.


3. Algorithms of Evidence Combination


The most obvious shortcoming of D-S evidence theory is the tremendous amount of calculation caused by the number of focal elements. In general, $n$ elements in the frame $\Theta$ can give rise to up to $2^n-1$ focal elements. If there are 20 elements in the frame $\Theta$, there are up to $2^{20}-1 \approx 1.048576\times10^{6}$ possible focal elements. There are two main ways to solve this problem: fast algorithms for special evidence structures, and approximation algorithms that reduce the number of focal elements.
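
The growth in the number of focal elements can be checked with a short Python sketch. The helper `nonempty_subsets` is a hypothetical name, and the pair count assumes a naive combination that visits every pair of focal elements from two BPAs.

```python
from itertools import combinations

def nonempty_subsets(frame):
    """Yield every nonempty subset of the frame as a frozenset."""
    items = sorted(frame)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

# Enumerating a small frame confirms the closed-form count 2**n - 1.
small = {"a", "b", "c", "d"}
assert sum(1 for _ in nonempty_subsets(small)) == 2 ** len(small) - 1

# Worst-case number of focal elements and of pairs visited when two
# BPAs are combined naively, for frames of increasing size.
for n in (5, 10, 20):
    focal = 2 ** n - 1
    print(n, focal, focal ** 2)
```

Explicit enumeration is only feasible for small frames, which is the motivation for the fast and approximate algorithms discussed in this section.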

Barnett designed a fast algorithm for a simple evidence structure in which each piece of evidence either supports a hypothesis or does not. For evidential reasoning problems in which the evidence space can be expressed as a tree-shaped hierarchy (such as medical diagnosis), Gordon and Shortliffe designed another fast D-S algorithm. Pearl used Bayesian inference in the hypothesis space to simplify the calculation process and reduce the amount of calculation. However, Shafer found that the calculation error is large for highly conflicting evidence, and so improved the D-S method and gave a precise algorithm under the hierarchy condition. This kind of algorithm fully embodies Dempster's synthesis rule, and the calculation result is relatively accurate, but the application scope is narrow.

The approximation algorithm is the most efficient method for reducing the number of focal elements. Voorbraak found that using a Bayesian approximation to replace the reliability function does not affect the result of synthesis by Dempster's rule, and proved that the Bayesian approximation of the combined reliability functions is equal to the combination of the Bayesian approximations of the individual reliability functions; this method greatly reduces the amount of calculation. The consonant approximation was proposed by Dubois and Prade. The focal elements obtained by this method are nested, and their number does not exceed the number of hypotheses in the frame of discernment. The consonant approximation is good at expressing evidence, but it often introduces large errors, so it is not suitable for practical applications. Tessem kept the focal elements with large masses for approximate calculation and put forward the $(k, l, x)$ approximation method. The $(k, l, x)$ approximation method is especially suitable for fast rule strength calculation; it not only improves the speed of evidence synthesis but also leaves the decision made from the mass functions essentially unaffected. Simard et al. suggested the truncated D-S algorithm. It always keeps the basic probability assignment of "do not know" nonzero, so that focal elements arriving later are not ruled out, readjusts $m(\Theta)$ after each synthesis, and retains the basic probability of the focal elements that remain after the "trim". The algorithm has the advantages of both reducing the computation and preserving adaptability; its biggest shortcoming is that the order of evidence synthesis affects the result of the calculation.

In practical applications, the Bayesian approximation method and the $(k, l, x)$ approximation method are, in essence, conversions of BPAs into Bayesian probabilities. They differ in how the BPAs expressing "not sure" and "do not know" are approximately transferred into "sure" and "know" probabilities. In some sense, the Simard approximation algorithm is closer to the "style" of the conventional D-S method.

Inspired by the pignistic probability transformation, Burger and Cuzzolin put forward two kinds of k-additive BPAs. A hierarchical clustering method was put forward by Denoeux to realize inner and outer approximations of BPAs. A hierarchical mass distribution method was also proposed to achieve BPA approximation. Han et al. used the distance between pieces of evidence and an uncertainty measure to optimize the BPA approximation.

The fast algorithms for special evidence structures and the methods of reducing focal elements have different application environments, advantages, and disadvantages. The principles for choosing a suitable algorithm in a data fusion system include 1) the number of focal elements, 2) the distribution of the mass functions, 3) how many mass functions are to be synthesized, 4) the form of the initial reliability function (a Bayesian reliability function, a belief function, or a simple support function), and 5) whether the method is used for the expression of evidence or for automatic decision-making.


4. Conflict during Evidence Combining 


The D-S evidence theory is an important tool for uncertainty reasoning. In evidence theory, the well-known commutative and associative Dempster's rule is used for evidence combination when all the sources are considered equally reliable. Although Dempster's rule of combination is well founded theoretically, researchers in this field consider its lack of robustness a limitation. This is because counterintuitive results are obtained in some cases, especially when there is high conflict among bodies of evidence.

Suppose the frame of discernment is $\Theta=\{A, B, C\}$ and the BPAs of the two pieces of evidence are $m_1(A)=0.99$, $m_1(B)=0.01$, $m_1(C)=0$, $m_2(A)=0$, $m_2(B)=0.01$, and $m_2(C)=0.99$.

From Dempster's formula, the conflicting mass is $m(\emptyset)=0.0099+0.9801+0.0099=0.9999$, so the normalization factor in (6) is $K=1-0.9999=0.0001$.

The fusion results are $m_{12}(A)=0$, $m_{12}(B)=0.0001/0.0001=1$, and $m_{12}(C)=0$.

Although the degrees of support for $B$ from both $m_1$ and $m_2$ are very low, the fusion result asserts that $B$ is true. This is obviously counterintuitive, and such results are harmful to decision-making.

There exist two major viewpoints on the so-called counterintuitive combination results. The first is that the counterintuitive results are due to Dempster's rule of combination, especially its normalization step. Thus, a number of researchers have proposed alternative combination rules that use various strategies to redistribute the conflict and provide a fusion tool whose results match expectations, such as the Yager rule, the Lefevre method, the DP rule, the Quan Sun allocation method, and the Shanying Zhang allocation method. The second viewpoint is that the counterintuitive results are due to the evidence that is combined, i.e., the data model. According to this viewpoint, no counterintuitive behavior results from the use of Dempster's rule of combination itself, and the mass functions should be regenerated or modified before combination occurs, using, for example, the discount coefficient method, the Murphy average method, or the Jousselme method.

Alternative combination rules are designed around how the conflict is assigned: the conflict is allocated, in some proportion, among the propositions. Yager suggested that conflict is the root cause of the failure and that conflicting evidence cannot provide effective information, so he assigned all conflict to the unknown, $m(\Theta)$. The improved formula can be used to combine highly conflicting evidence, but assigning all the conflict to the unknown is itself an irrational distribution and can lead to unreasonable results. Lefevre et al. thought that conflicting information cannot be completely abandoned. They proposed extracting and analyzing the conflict and incorporating it into the combination rule to obtain a new rule, and finally put forward a unified reliability function combination method. Dubois and Prade assigned the mass of each conflicting pair to the union of the conflicting focal elements, but since no distinction is made between the different focal elements, the combined result is more uncertain. Smets believed that the counterintuitive combination results from an incomplete frame of discernment, so he treated the empty set as representing unknown elements and assigned all conflict to it. These methods change the closure of evidence theory and bring more problems. Sun considered all evidence to be equally credible, defined an evidence validity coefficient by calculating the average conflict between pairs of evidence, and allocated the conflict to each proposition in proportion.
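
To make the redistribution idea concrete, the sketch below implements Yager's approach of moving the conflicting mass to the whole frame instead of normalizing, and applies it to the two highly conflicting BPAs on $\{A, B, C\}$ used earlier in this section. The function name `yager_combine` and the dictionary representation are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

def yager_combine(m1: Mass, m2: Mass, frame: FrozenSet[str]) -> Mass:
    """Conjunctive combination without normalization.

    The mass of disjoint pairs is added to the whole frame (the unknown).
    """
    out: Mass = {}
    conflict = 0.0
    for (y1, v1), (y2, v2) in product(m1.items(), m2.items()):
        inter = y1 & y2
        if inter:
            out[inter] = out.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    out[frame] = out.get(frame, 0.0) + conflict
    return out

frame = frozenset("ABC")
# Zero masses are simply omitted from the dictionaries.
m1 = {frozenset("A"): 0.99, frozenset("B"): 0.01}
m2 = {frozenset("B"): 0.01, frozenset("C"): 0.99}
m12 = yager_combine(m1, m2, frame)
print(round(m12[frozenset("B")], 4), round(m12[frame], 4))
```

With these inputs almost all of the mass ends up on the frame, which avoids asserting that $B$ is true but also shows why the result carries little information for decision-making.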

In addition, Martin and Osswald, Smarandache and Dezert, Deng et al., and Zhang et al. proposed improved algorithms of evidence combination, but most of these methods only suit specific application backgrounds. All of them pay too much attention to where and in what proportion the conflict is allocated, but neglect the reason why the evidence is unreliable.

Some methods that focus on correcting the source of evidence have also been proposed. Haenni argued that the combination rule of D-S theory has a solid mathematical basis and is a generalization of the Bayesian method, so when the evidence is conflicting, it is the source of evidence that should be modified. For this purpose, Shafer put forward a general discount coefficient method; however, in practical applications, the reliability of information differs from source to source, and the discount factor changes accordingly. Murphy averaged all the evidence before fusion, but ignored the credibility of the evidence and the correlation between pieces of evidence, so the combination results are not ideal. To address this, Xu et al. introduced an effective factor to measure the reliability of the evidence sources. Liang et al. introduced the concept of experts, but the associated values require a priori knowledge, so the method is not universal. In addition, Deng et al. and Ding also proposed methods of correcting the source. Revising the evidence sources speeds up the convergence of evidence synthesis and increases the reliability of the synthesis, but easily causes loss of information.
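
A minimal sketch of the discounting idea follows, under the common convention that a reliability factor `alpha` in [0, 1] scales every mass and the remainder is moved to the whole frame. The function name `discount`, the parameter name `alpha`, and the value 0.8 are assumptions made for illustration.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

def discount(m: Mass, alpha: float, frame: FrozenSet[str]) -> Mass:
    """Scale every mass by alpha and move the remaining 1 - alpha to the frame."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = {a: alpha * v for a, v in m.items() if a != frame}
    out[frame] = 1.0 - alpha + alpha * m.get(frame, 0.0)
    return out

frame = frozenset("ABC")
m1 = {frozenset("A"): 0.99, frozenset("B"): 0.01}
m1_d = discount(m1, 0.8, frame)  # 0.8 is a hypothetical reliability
print({"".join(sorted(a)): round(v, 4) for a, v in m1_d.items()})
```

Because a discounted source always leaves some mass on the frame, two discounted sources are never in total conflict, which softens the effect of the normalization step.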

Because the methods of correcting the synthesis rule and of modifying the source of evidence are each hard to apply generally and reasonably, in recent years many scholars have begun to put forward methods that combine the two, trying to take advantage of both to obtain a more reasonable method. However, the theoretical basis of most of these combined methods is insecure, and they can only be applied to specific examples, so it is very difficult to find a truly universal and reasonable fusion method.

Although both viewpoints are rational, we prefer the idea that unreliable sources are the cause of the counterintuitive results. One necessary condition for using Dempster's rule of combination is that all the sources are equally reliable. However, in many real applications, the sources of evidence to be combined may not have equal reliability. Therefore, we think that the evidence to be combined should be corrected according to the reliability of its sources, so as to provide a correct assessment of the given problem. The effects of the evidence from more reliable sources should be strengthened, and at the same time, the effects of the evidence from less reliable sources should be weakened.


5. Relationship of D-S Evidence Theory and Probability Theory


From Section 1, we know that the D-S evidence theory comes from probability theory and has a very close relationship with it. From Section 2, we know that one of the four main explanations of D-S evidence theory argues that when the BPA is defined only on single-element subsets, the BPA reduces to a probability. As the following example shows, this is not true.

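Suppose $\Theta=\{\theta_1, \theta_2, \theta_3\}$, and the BPAs are $m(\{\theta_1\})=0.2$, $m(\{\theta_2\})=0.2$, and $m(\{\theta_3\})=0.6$.

The probabilities are $p(\{\theta_1\})=0.2$, $p(\{\theta_2\})=0.2$, and $p(\{\theta_3\})=0.6$.

Now we consider whether $p(\cdot)=m(\cdot)$. By the additivity of probabilities, $p(\{\theta_1, \theta_2\})=p(\{\theta_1\})+p(\{\theta_2\})=0.4$.

And for the certain event, by additivity: $p(\Theta)=p(\{\theta_1\})+p(\{\theta_2\})+p(\{\theta_3\})=1$.

The masses of all subsets other than $\{\theta_1\}$, $\{\theta_2\}$, and $\{\theta_3\}$ are 0. So we have $m(\{\theta_1, \theta_2\})=0 \neq m(\{\theta_1\})+m(\{\theta_2\})=0.4$ and $m(\Theta)=0 \neq m(\{\theta_1\})+m(\{\theta_2\})+m(\{\theta_3\})=1$.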

To show the difference more intuitively, $p(\cdot)$ and $m(\cdot)$ are compared in Table 1.

















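Table 1: Difference between $p(\cdot)$ and $m(\cdot)$

| $p(\cdot)$ | $m(\cdot)$ |
|---|---|
| $p(\theta_1)=0.2$ | $m(\{\theta_1\})=0.2$ |
| $p(\theta_2)=0.2$ | $m(\{\theta_2\})=0.2$ |
| $p(\theta_3)=0.6$ | $m(\{\theta_3\})=0.6$ |
| $p(\theta_1 \cup \theta_2)=p(\theta_1)+p(\theta_2)=0.4$ | $m(\{\theta_1, \theta_2\})=0 \neq m(\{\theta_1\})+m(\{\theta_2\})=0.4$ |
| $p(\Theta)=1$ | $m(\Theta)=0$ |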








From Table 1, we see that even when the BPA is defined only on single-element subsets, the BPA satisfies neither additivity nor $m(\Theta)=1$, so the BPA is not equivalent to a probability. The BPA resembles a probability only in form. On the other hand, the D-S evidence theory can be viewed as an imprecise probability method when a proposition is characterized by upper and lower probabilities, because the probability interval is similar to the belief interval $[\mathrm{Bel}(A), \mathrm{Pl}(A)]$.

From Section 2, we know that some researchers argue that when the BPA is defined only on single-element subsets, Dempster's rule is equivalent to the Bayes formula. We give an example to compare Dempster's rule and the Bayes formula.











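Set the frame $\Theta=\{A, B\}$ with $P(A)=0.6$ (so that $P(B)=0.4$), $P(E \mid A)=0.8$, and $P(E \mid B)=0.2$. From the Bayes formula, we get

$$P(A \mid E) = \frac{P(E \mid A)P(A)}{P(E \mid A)P(A) + P(E \mid B)P(B)} = \frac{0.8 \times 0.6}{0.8 \times 0.6 + 0.2 \times 0.4} = 0.86$$

$$P(B \mid E) = \frac{P(E \mid B)P(B)}{P(E \mid B)P(B) + P(E \mid A)P(A)} = \frac{0.2 \times 0.4}{0.2 \times 0.4 + 0.8 \times 0.6} = 0.14.$$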


If we convert the above evidence into single-point BPAs, we have $m_1(A)=0.6$, $m_1(B)=0.4$, $m_2(A)=0.8$, and $m_2(B)=0.2$.

From Dempster's rule, we have $m_{12}(A)=0.86$ and $m_{12}(B)=0.14$.

The results are the same, but this does not show that Dempster's rule is equivalent to the Bayes formula. First, Dempster's rule treats all pieces of evidence equally, whereas our example regarded the prior probability and the likelihood function as two independent pieces of evidence, which is not reasonable. Second, the example rests on the strong implicit assumption that $P(E \mid A)+P(E \mid B)=1$. This assumption is not a necessary condition in the Bayes formula, but it is necessary for BPAs.
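
The numerical agreement in this example, and the role of the implicit assumption, can be checked with a short Python sketch. The function names `bayes_posterior` and `dempster_singletons` are hypothetical, the second function covers only the special case in which every focal element is a singleton, and the likelihood pair `lik2` is an invented counterexample.

```python
def bayes_posterior(prior, likelihood):
    """Posterior P(h | E) from priors P(h) and likelihoods P(E | h)."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: v / total for h, v in joint.items()}

def dempster_singletons(m1, m2):
    """Dempster's rule when every focal element is a singleton.

    Only identical singletons intersect, so the agreeing mass is the
    elementwise product and K is the sum of those products.
    """
    agree = {h: m1[h] * m2[h] for h in m1}
    k = sum(agree.values())
    return {h: v / k for h, v in agree.items()}

prior = {"A": 0.6, "B": 0.4}
lik = {"A": 0.8, "B": 0.2}
post = bayes_posterior(prior, lik)
m12 = dempster_singletons(prior, lik)
print(round(post["A"], 2), round(m12["A"], 2))

# Likelihoods that do not sum to 1 are still valid in the Bayes formula,
# but they cannot be read as a BPA because the masses must sum to 1.
lik2 = {"A": 0.8, "B": 0.5}
post2 = bayes_posterior(prior, lik2)
print(round(post2["A"], 4), abs(sum(lik2.values()) - 1.0) < 1e-12)
```

The two computations coincide here only because the likelihoods happen to form a valid singleton BPA.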

In short, the D-S evidence theory is an inexact generalization of probability theory.


6. Conclusions 


The D-S evidence theory has the following three limitations, because of which it may not achieve the expected results in practice: 1) The pieces of evidence must be independent, which is sometimes not easy to satisfy. 2) Evidence combination requires a tremendous computing workload. 3) Evidence combination can produce counterintuitive results. In recent years, scholars have done much work on 2) and 3), but no breakthrough has appeared for 1). In the current development of D-S evidence theory, related theories, such as fuzzy set theory, random set theory, rough set theory, the analytic hierarchy process, and neural network analysis, are being used to explain and optimize the results of D-S evidence theory.

In addition, the D-S evidence theory is a form of random set theory, but random set theory lacks statistical techniques. In essence, BPAs are distributions of random variables, and Dempster's rule is the computation rule for random sets. Both depend on the study of random set theory. So the best way to expand the application of D-S theory is to enrich the study of random set theory.

In terms of applications, the D-S evidence theory has been used in intelligent identification systems, fault diagnosis, human resource management, risk assessment, decision-making evaluation, etc. As research deepens and some key problems are solved, its applications will become wider.